Re: [devel] [PATCH 1/1] imm: removed extra OpenSafSmfConfig in xml [#2762]
Ack Thanks Lennart > -Original Message- > From: Srinivas [mailto:srinivas.mangip...@oracle.com] > Sent: den 19 januari 2018 09:48 > To: Lennart Lund; Rafael Odzakow > ; ravisekhar.ko...@oracle.com; > syam.tall...@oracle.com > Cc: opensaf-devel@lists.sourceforge.net; Srinivas > > Subject: [PATCH 1/1] imm: removed extra OpenSafSmfConfig in xml [#2762] > > --- > src/smf/config/smfsv_classes.xml | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/src/smf/config/smfsv_classes.xml > b/src/smf/config/smfsv_classes.xml > index 06a6f52..6423246 100644 > --- a/src/smf/config/smfsv_classes.xml > +++ b/src/smf/config/smfsv_classes.xml > @@ -349,7 +349,7 @@ > smfNodeRebootCmd > SA_STRING_T > SA_CONFIG > -SA_WRITABLEOpenSafSmfConfig > +SA_WRITABLE > > > smfInactivatePbeDuringUpgrade > -- > 2.7.4 -- Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot ___ Opensaf-devel mailing list Opensaf-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-devel
Re: [devel] [PATCH 0/5] Review Request for Add support for split brain prevention V2 [#64]
Ack from me also, with comments: * I think my major comment is that I had originally envisioned that you would use the "etcdctl lock" command (in the V3 API) and that the active SC would hold the lock for as long as it is active. The lock would not be needed for reading. Your approach of only creating the lock when you wish to change active controller could be fine though. However, you shouldn't need the lock for reading - only when you wish to update the active controller. Regarding the Watch: I think you should have the watch on the lock instead of (or in addition to) the data you are protecting. At a fail-over, the old standby would acquire the lock and wait a while to give the old active enough time to detect that a fail-over is pending (it notices that the lock has been created). The old active would then be able to remove the lock and prevent the fail-over from happening. We can look into this in the next iteration (next release) and keep it as it is for now. * You ought to utilise the test-and-set functionality in the etcd v2 protocol, in the cases where you are changing the value of a key and know (think you know) the previous value. unlock is an example of this, fail-over probably also. We could add this later but I think you should at least extend the plugin API already now, so that it takes a "previous value" parameter where applicable. * You have a try-again loop when you acquire the lock, but if the maximum number of retries have been done then you continue as if the lock was acquired successfully. It doesn't seem to be correct? * It is not obvious (to me) that no more Watch thread can be created simultaneously. Could you add a flag that keeps track of if there is an existing thread, and add assert statements checking that there is no existing thread when you call MonitorActive() to create a new one? * As Hans points out below, it seems that it is also possible that the watch thread could disappear silently in some error case. * As already pointed out by Hans, we should store our keys in some directory in the etcd database, so that the same database can be used for other purposes as well. I think the plugin (shell script) should add a directory prefix to the key. Also see one comment inline below, marked AndersW> regards, Anders Widell On 01/19/2018 01:47 PM, Hans Nordebäck wrote: Hi Gary, ack, with some comments: - One test case is missing in the implementation sketch, when using TIPC, if some TIPC bearers are disabled and as etcd/raft is using TCP, we should verify this works. AndersW> Split-brain should not be possible, however the current algorithm will not guarantee that the active SC will be in the largest partition in case TIPC connectivity is broken (partitioned). So it could happen that a single isolated node (from TIPC point of view) is the active SC, even though a larger TIPC partition exists. I think this could be solved by writing the size of the cluster into the lock. An existing active SC shall reject a fail-over if it is being initiated from a node in a smaller partition. - The Lock uses TTL, I think the holder of the lock needs to update the TTL until is unlocked, or there is a chance that some parts of the code will run with the lock not taken. - what if consensus_service.MonitorActive threadFunction gets connnection timeout, this thread seems to only log a message and returns. - sleep(3) in MonitorCallback should be changed. - Perhaps we should have a directory structure in etcd instead of using /, e.g /opensaf/clm/opensaf_active_controller and perhaps we should use only one key/value pair with TTL and use test and set. - In Consensus::FenceNode, UINT_MAX is used as argument to opensaf_reboot, do this work? /Thanks HansN On 01/19/2018 12:39 PM, Gary Lee wrote: Summary: Add support for split brain prevention V2 [#64] Review request for Ticket(s): 64 Peer Reviewer(s): Anders, Hans Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE *** Affected branch(es): develop Development branch: ticket-64 Base revision: e1e0d2c0dc45a5ca7789f19d58dde0a41ed19354 Personal repository: git://git.code.sf.net/u/userid-2226215/review Impacted area Impact y/n Docs y Build system y RPM/packaging n Configuration files n Startup scripts n SAF services y OpenSAF services y Core libraries n Samples n Tests n Other n Comments (indicate scope for each "y" above): - Changes from V1: * fixed most cppcheck/cpplint errors in osaf/consensus * disable self-fencing if remote-fencing is enabled * reboot active controller if it loses quorum (write access) * better error handling revision 7ab1280243311058a6848c4da2b9738ab73dc861 Author: Gary Lee
Re: [devel] [PATCH 0/5] Review Request for Add support for split brain prevention V2 [#64]
Hi Gary, ack, with some comments: - One test case is missing in the implementation sketch, when using TIPC, if some TIPC bearers are disabled and as etcd/raft is using TCP, we should verify this works. - The Lock uses TTL, I think the holder of the lock needs to update the TTL until is unlocked, or there is a chance that some parts of the code will run with the lock not taken. - what if consensus_service.MonitorActive threadFunction gets connnection timeout, this thread seems to only log a message and returns. - sleep(3) in MonitorCallback should be changed. - Perhaps we should have a directory structure in etcd instead of using /, e.g /opensaf/clm/opensaf_active_controller and perhaps we should use only one key/value pair with TTL and use test and set. - In Consensus::FenceNode, UINT_MAX is used as argument to opensaf_reboot, do this work? /Thanks HansN On 01/19/2018 12:39 PM, Gary Lee wrote: Summary: Add support for split brain prevention V2 [#64] Review request for Ticket(s): 64 Peer Reviewer(s): Anders, Hans Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE *** Affected branch(es): develop Development branch: ticket-64 Base revision: e1e0d2c0dc45a5ca7789f19d58dde0a41ed19354 Personal repository: git://git.code.sf.net/u/userid-2226215/review Impacted area Impact y/n Docsy Build systemy RPM/packaging n Configuration files n Startup scripts n SAF servicesy OpenSAF servicesy Core libraries n Samples n Tests n Other n Comments (indicate scope for each "y" above): - Changes from V1: * fixed most cppcheck/cpplint errors in osaf/consensus * disable self-fencing if remote-fencing is enabled * reboot active controller if it loses quorum (write access) * better error handling revision 7ab1280243311058a6848c4da2b9738ab73dc861 Author: Gary LeeDate: Fri, 19 Jan 2018 22:29:42 +1100 doc: update README and makefiles [#64] revision 625928304450399548c353473dea631a44aeecbe Author: Gary Lee Date: Fri, 19 Jan 2018 22:28:59 +1100 fmd: update consensus service during controller failover [#64] revision 42539da74893d5ce246242cf8d33c7875ea50fe8 Author: Gary Lee Date: Fri, 19 Jan 2018 22:26:19 +1100 amfd: update consensus service when performing SI swap [#64] When a node goes down and split-brain prevention is enabled, check that we still have write access to the consensus service. If not and fencing is disabled, reboot the node to prevent split brain. revision b6fcd4bede291ba5996b838c3fb784842648581e Author: Gary Lee Date: Fri, 19 Jan 2018 22:23:14 +1100 rded: add split brain prevention support [#64] * consult with consensus service before promoting node to active * add watch thread and self-fence if it detects active controller has been changed (if remote fencing is disabled) revision 656b670a91a10e385604c98366239a28cde925f7 Author: Gary Lee Date: Fri, 19 Jan 2018 22:22:53 +1100 osaf: add consensus API [#64] Added Files: src/osaf/consensus/Makefile src/osaf/consensus/keyvalue.cc src/osaf/consensus/keyvalue.h src/osaf/consensus/plugins/etcd.plugin src/osaf/consensus/plugins/sample.plugin src/osaf/consensus/service.cc src/osaf/consensus/service.h Complete diffstat: -- 00-README.conf | 56 +++ Makefile.am | 4 +- src/amf/amfd/ndproc.cc | 12 +- src/amf/amfd/osaf-amfd.in| 4 + src/amf/amfd/role.cc | 35 +++- src/fm/Makefile.am | 1 + src/fm/fmd/fm_main.cc| 37 - src/fm/fmd/fm_rda.cc | 28 src/fm/fmd/fmd.conf | 8 + src/osaf/Makefile.am | 8 +- src/osaf/consensus/Makefile | 18 ++ src/osaf/consensus/keyvalue.cc | 174 +++ src/osaf/consensus/keyvalue.h| 57 +++ src/osaf/consensus/plugins/etcd.plugin | 220 src/osaf/consensus/plugins/sample.plugin | 163 ++ src/osaf/consensus/service.cc| 277 +++ src/osaf/consensus/service.h | 75 + src/rde/Makefile.am | 3 +- src/rde/rded/osaf-rded.in| 4 + src/rde/rded/rde_cb.h| 3 +- src/rde/rded/rde_main.cc | 35 +++- src/rde/rded/role.cc | 47 +- src/rde/rded/role.h | 2 + 23 files changed, 1248 insertions(+), 23 deletions(-) Testing
[devel] [PATCH 4/5] fmd: update consensus service during controller failover [#64]
--- src/fm/Makefile.am| 1 + src/fm/fmd/fm_main.cc | 37 +++-- src/fm/fmd/fm_rda.cc | 28 src/fm/fmd/fmd.conf | 8 4 files changed, 68 insertions(+), 6 deletions(-) diff --git a/src/fm/Makefile.am b/src/fm/Makefile.am index d48a9146c..0f254b94f 100644 --- a/src/fm/Makefile.am +++ b/src/fm/Makefile.am @@ -49,4 +49,5 @@ bin_osaffmd_SOURCES = \ bin_osaffmd_LDADD = \ lib/libSaAmf.la \ lib/libSaClm.la \ + lib/libosaf_common.la \ lib/libopensaf_core.la diff --git a/src/fm/fmd/fm_main.cc b/src/fm/fmd/fm_main.cc index db8395ee7..74517b3b5 100644 --- a/src/fm/fmd/fm_main.cc +++ b/src/fm/fmd/fm_main.cc @@ -28,7 +28,8 @@ This file contains the main() routine for FM. #include #include "base/daemon.h" #include "base/logtrace.h" - +#include "base/osaf_extended_name.h" +#include "osaf/consensus/service.h" #include "nid/agent/nid_api.h" #include "fm.h" #include "base/osaf_time.h" @@ -553,6 +554,8 @@ static void fm_mbx_msg_handler(FM_CB *fm_cb, FM_EVT *fm_mbx_evt) TRACE_ENTER(); switch (fm_mbx_evt->evt_code) { case FM_EVT_NODE_DOWN: + { + Consensus consensus_service; LOG_NO("Current role: %s", role_string[fm_cb->role]); if ((fm_mbx_evt->node_id == fm_cb->peer_node_id)) { /* Check whether node(AMF) initialization is done */ @@ -593,15 +596,27 @@ static void fm_mbx_msg_handler(FM_CB *fm_cb, FM_EVT *fm_mbx_evt) * trigerred quicker than the node_down event * has been received. */ + if (fm_cb->role == PCS_RDA_STANDBY) { + const std::string current_active = consensus_service.CurrentActive(); + if (current_active.compare( + osaf_extended_name_borrow(_cb->peer_node_name)) == 0) { + // update consensus service, before fencing old active controller + consensus_service.DemoteCurrentActive(); + } + } + if (fm_cb->use_remote_fencing) { if (fm_cb->peer_node_terminated == false) { + // if peer_sc_up is true then + // the node has come up already + if (fm_cb->peer_sc_up == false && fm_cb->immnd_down == true) { opensaf_reboot( - fm_cb->peer_node_id, - (char *)fm_cb - ->peer_clm_node_name - .value, - "Received Node Down for peer controller"); + fm_cb->peer_node_id, + (char *)fm_cb + ->peer_clm_node_name.value, + "Received Node Down for peer controller"); + } } else { LOG_NO( "Peer node %s is terminated, fencing will not be performed", @@ -624,6 +639,7 @@ static void fm_mbx_msg_handler(FM_CB *fm_cb, FM_EVT *fm_mbx_evt) } } } + } break; case FM_EVT_PEER_UP: @@ -659,6 +675,15 @@ static void fm_mbx_msg_handler(FM_CB *fm_cb, FM_EVT *fm_mbx_evt) 0, NULL, "Failover occurred, but this node is not yet ready"); } + + Consensus consensus_service; + const std::string current_active = consensus_service.CurrentActive(); + if (current_active.compare( + osaf_extended_name_borrow(_cb->peer_node_name)) == 0) { + // update consensus service, before fencing old active controller + consensus_service.DemoteCurrentActive(); + } + /* Now. Try resetting other blade */ fm_cb->role = PCS_RDA_ACTIVE; diff --git a/src/fm/fmd/fm_rda.cc b/src/fm/fmd/fm_rda.cc index
[devel] [PATCH 5/5] doc: update README and makefiles [#64]
--- 00-README.conf | 56 Makefile.am | 4 +++- src/osaf/Makefile.am | 8 ++-- 3 files changed, 65 insertions(+), 3 deletions(-) diff --git a/00-README.conf b/00-README.conf index 5de286225..007872d98 100644 --- a/00-README.conf +++ b/00-README.conf @@ -654,3 +654,59 @@ on each node, except on the active node. This file indicates that a cluster reboot is in progress and all nodes needs to delay their start, this to give the active a lead. +Split-Brain Prevention with Consensus Service += + +OpenSAF implements split-brain prevention by utilizing a consensus service that +implements a replicated state machine. The consensus service uses quorum to +prevent state changes in network partitions that don't include more than half +of the nodes in the cluster. In network partitions containing +half of the nodes or less, the state is either read-only or unavailable. +Thus, it is important to keep in mind that the consensus service by itself +does not prevent the presence of multiple active system +controller nodes. In the case when the network has been split up into partitions +and the current active system controller no longer has write access to the +state machine, OpenSAF relies on some additional mechanism like fencing to +ensure that the current active system controller disappears before a new +active system controller can be chosen among the nodes that do have write +access to the replicated state machine. If fencing is not available, the old +active system controller can detect that it has lost write +access and step down from its active role. + +The consensus service can be implemented, for example, using the RAFT algorithm. +When using RAFT, there are mainly three possibilities: + +1. The RAFT servers run on the same nodes as OpenSAF +2. The RAFT servers run on a subset of the OpenSAF nodes +3. The RAFT servers run on an external set of nodes, outside of the + OpenSAF cluster + +The consensus services relies on a plugin to communicate with a distributed +key-value store database. This plugin must still function according to the +API when the network has split up into partitions. +The plugin interface is defined in src/osaf/consensus/plugins/sample.plugin + +An implementation for etcdv2 is provided. It assumes etcd is installed +and configured on all system controllers. In clusters where +there are only two system controllers, it is highly recommended to +configure etcd so it runs on at least three nodes to facilitate +a majority vote with failure tolerance. + +Other implementations of a distributed key-value store service +can be used, provided as it implements the interface documented in sample.plugin + +To enable split-brain prevention, edit fmd.conf and update accordingly: + +export FMS_SPLIT_BRAIN_PREVENTION=1 +export FMS_KEYVALUE_STORE_PLUGIN_CMD=/usr/local/lib/opensaf/etcd.plugin + +As discussed, the key-value store does not need to reside on the same nodes +as OpenSAF. In such a configuration, an appropriate plugin that handles +the communication with a remotely located key-value store, must be provided. + +If remote fencing is enabled, then it will be used to fence a node that the +consensus service believes should not be active. Otherwise, rded/amfd will +initiate a 'self-fencing' by rebooting the node, if it determines the node +should no longer be active according to the consensus service, to prevent +a split-brain situation. + diff --git a/Makefile.am b/Makefile.am index bcfd844cd..57c2585a8 100644 --- a/Makefile.am +++ b/Makefile.am @@ -159,7 +159,9 @@ dist_osaf_execbin_SCRIPTS += \ $(top_srcdir)/scripts/opensaf_reboot \ $(top_srcdir)/scripts/opensaf_sc_active \ $(top_srcdir)/scripts/opensaf_scale_out \ - $(top_srcdir)/scripts/plm_scale_out + $(top_srcdir)/scripts/plm_scale_out \ + $(top_srcdir)/src/osaf/consensus/plugins/etcd.plugin +# TODO remove above line before pushing include $(top_srcdir)/src/ais/Makefile.am include $(top_srcdir)/src/base/Makefile.am diff --git a/src/osaf/Makefile.am b/src/osaf/Makefile.am index 05b78c988..10bbe427b 100644 --- a/src/osaf/Makefile.am +++ b/src/osaf/Makefile.am @@ -16,7 +16,9 @@ noinst_HEADERS += \ src/osaf/immutil/immutil.h \ - src/osaf/saflog/saflog.h + src/osaf/saflog/saflog.h \ + src/osaf/consensus/keyvalue.h \ + src/osaf/consensus/service.h pkglib_LTLIBRARIES += lib/libosaf_common.la @@ -33,7 +35,9 @@ lib_libosaf_common_la_LDFLAGS = \ lib_libosaf_common_la_SOURCES = \ src/osaf/immutil/immutil.c \ - src/osaf/saflog/saflog.c + src/osaf/saflog/saflog.c \ + src/osaf/consensus/keyvalue.cc \ + src/osaf/consensus/service.cc nodist_EXTRA_lib_libosaf_common_la_SOURCES = dummy.cc -- 2.14.1 -- Check out the vibrant tech community on
[devel] [PATCH 3/5] amfd: update consensus service when performing SI swap [#64]
When a node goes down and split-brain prevention is enabled, check that we still have write access to the consensus service. If not and fencing is disabled, reboot the node to prevent split brain. --- src/amf/amfd/ndproc.cc| 12 +++- src/amf/amfd/osaf-amfd.in | 4 src/amf/amfd/role.cc | 35 ++- 3 files changed, 45 insertions(+), 6 deletions(-) diff --git a/src/amf/amfd/ndproc.cc b/src/amf/amfd/ndproc.cc index 0c6316627..df68b3dbf 100644 --- a/src/amf/amfd/ndproc.cc +++ b/src/amf/amfd/ndproc.cc @@ -32,8 +32,8 @@ */ #include "osaf/immutil/immutil.h" +#include "osaf/consensus/service.h" #include "base/logtrace.h" - #include "amf/amfd/amfd.h" #include "amf/amfd/imm.h" #include "amf/amfd/cluster.h" @@ -1221,5 +1221,15 @@ void avd_node_failover(AVD_AVND *node) { avd_pg_node_csi_del_all(avd_cb, node); avd_node_down_mw_susi_failover(avd_cb, node); avd_node_down_appl_susi_failover(avd_cb, node); + + Consensus consensus_service; + if (consensus_service.IsRemoteFencingEnabled() == false && + consensus_service.IsWritable() == false) { +// remote fencing is disabled and we have lost write access +// reboot this node to prevent split brain +opensaf_reboot(0, nullptr, + "Quorum lost. Rebooting this node to prevent split-brain"); + } + TRACE_LEAVE(); } diff --git a/src/amf/amfd/osaf-amfd.in b/src/amf/amfd/osaf-amfd.in index 45c5ab9e4..26a77ef52 100644 --- a/src/amf/amfd/osaf-amfd.in +++ b/src/amf/amfd/osaf-amfd.in @@ -28,6 +28,10 @@ else . $pkgsysconfdir/amfd.conf fi +if [ -f "$pkgsysconfdir/fmd.conf" ]; then + . "$pkgsysconfdir/fmd.conf" +fi + binary=$pkglibdir/$osafprog pidfile=$pkgpiddir/$osafprog.pid lockfile=$lockdir/$initscript diff --git a/src/amf/amfd/role.cc b/src/amf/amfd/role.cc index 865d89d94..aac90eb13 100644 --- a/src/amf/amfd/role.cc +++ b/src/amf/amfd/role.cc @@ -38,6 +38,7 @@ #include "osaf/immutil/immutil.h" #include "base/logtrace.h" #include "rde/agent/rda_papi.h" +#include "osaf/consensus/service.h" #include "amf/amfd/amfd.h" #include "amf/amfd/imm.h" @@ -1085,6 +1086,12 @@ uint32_t amfd_switch_actv_qsd(AVD_CL_CB *cb) { avd_d2n_msg_dequeue(cb); } + Consensus consensus_service; + rc = consensus_service.DemoteThisNode(); + if (rc != SA_AIS_OK) { +LOG_ER("Failed to demote this node from consensus service"); + } + TRACE_LEAVE(); return NCSCC_RC_SUCCESS; } @@ -1209,13 +1216,21 @@ uint32_t amfd_switch_stdby_actv(AVD_CL_CB *cb) { cb->avail_state_avd = SA_AMF_HA_ACTIVE; osaf_mutex_unlock_ordie(_reinit_mutex); + Consensus consensus_service; + rc = consensus_service.BeginActivePromotion(); + if (rc != SA_AIS_OK) { +LOG_ER("Unable to set active controller in consensus service"); +osafassert(false); + } + /* Declare this standby as Active. Set Vdest role role */ if (NCSCC_RC_SUCCESS != (status = avd_mds_set_vdest_role(cb, SA_AMF_HA_ACTIVE))) { LOG_ER("Switch Standby --> Active FAILED, MDS role set failed"); cb->swap_switch = false; avd_d2d_chg_role_rsp(cb, NCSCC_RC_FAILURE, SA_AMF_HA_ACTIVE); -return NCSCC_RC_FAILURE; +status = NCSCC_RC_FAILURE; +goto done; } /* Time to send fail-over messages to all the AVND's */ @@ -1240,7 +1255,8 @@ uint32_t amfd_switch_stdby_actv(AVD_CL_CB *cb) { } else { cb->swap_switch = false; avd_d2d_chg_role_rsp(cb, NCSCC_RC_FAILURE, SA_AMF_HA_ACTIVE); - return NCSCC_RC_FAILURE; + status = NCSCC_RC_FAILURE; + goto done; } } @@ -1259,7 +1275,8 @@ uint32_t amfd_switch_stdby_actv(AVD_CL_CB *cb) { in avd_imm_reinit_bg_thread.*/ } else { avd_d2d_chg_role_rsp(cb, NCSCC_RC_FAILURE, SA_AMF_HA_ACTIVE); - return NCSCC_RC_FAILURE; + status = NCSCC_RC_FAILURE; + goto done; } } else osaf_mutex_unlock_ordie(_reinit_mutex); @@ -1274,7 +1291,8 @@ uint32_t amfd_switch_stdby_actv(AVD_CL_CB *cb) { LOG_ER("Switch Standby --> Active, clm track start failed"); Fifo::queue(new ClmTrackStart()); avd_d2d_chg_role_rsp(cb, NCSCC_RC_FAILURE, SA_AMF_HA_ACTIVE); -return NCSCC_RC_FAILURE; +status = NCSCC_RC_FAILURE; +goto done; } /* Send the message to other avd for role change rsp as success */ @@ -1291,8 +1309,15 @@ uint32_t amfd_switch_stdby_actv(AVD_CL_CB *cb) { } } + status = NCSCC_RC_SUCCESS; +done: + rc = consensus_service.EndActivePromotion(); + if (rc != SA_AIS_OK) { +LOG_ER("Unable to remove lock in consensus service"); + } + TRACE_LEAVE(); - return NCSCC_RC_SUCCESS; + return status; } /\ -- 2.14.1 -- Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot ___
[devel] [PATCH 2/5] rded: add split brain prevention support [#64]
* consult with consensus service before promoting node to active * add watch thread and self-fence if it detects active controller has been changed (if remote fencing is disabled) --- src/rde/Makefile.am | 3 ++- src/rde/rded/osaf-rded.in | 4 src/rde/rded/rde_cb.h | 3 ++- src/rde/rded/rde_main.cc | 35 ++- src/rde/rded/role.cc | 47 ++- src/rde/rded/role.h | 2 ++ 6 files changed, 86 insertions(+), 8 deletions(-) diff --git a/src/rde/Makefile.am b/src/rde/Makefile.am index c967f9fc4..182f347ab 100644 --- a/src/rde/Makefile.am +++ b/src/rde/Makefile.am @@ -58,7 +58,8 @@ bin_osafrded_SOURCES = \ bin_osafrded_LDADD = \ lib/libSaAmf.la \ - lib/libopensaf_core.la + lib/libopensaf_core.la \ + lib/libosaf_common.la bin_rdegetrole_CPPFLAGS = \ $(AM_CPPFLAGS) diff --git a/src/rde/rded/osaf-rded.in b/src/rde/rded/osaf-rded.in index 1c1786c8d..1697936a7 100644 --- a/src/rde/rded/osaf-rded.in +++ b/src/rde/rded/osaf-rded.in @@ -28,6 +28,10 @@ else . $pkgsysconfdir/rde.conf fi +if [ -f "$pkgsysconfdir/fmd.conf" ]; then + . "$pkgsysconfdir/fmd.conf" +fi + binary=$pkglibdir/$osafprog pidfile=$pkgpiddir/$osafprog.pid tracefile=$pkglogdir/$osafprog.log diff --git a/src/rde/rded/rde_cb.h b/src/rde/rded/rde_cb.h index d2a3d46b2..83f35c691 100644 --- a/src/rde/rded/rde_cb.h +++ b/src/rde/rded/rde_cb.h @@ -45,7 +45,8 @@ enum RDE_MSG_TYPE { RDE_MSG_PEER_UP = 1, RDE_MSG_PEER_DOWN = 2, RDE_MSG_PEER_INFO_REQ = 3, - RDE_MSG_PEER_INFO_RESP = 4 + RDE_MSG_PEER_INFO_RESP = 4, + RDE_MSG_NEW_ACTIVE_CALLBACK = 5 }; struct rde_peer_info { diff --git a/src/rde/rded/rde_main.cc b/src/rde/rded/rde_main.cc index 0298bf3ff..ba472fc6a 100644 --- a/src/rde/rded/rde_main.cc +++ b/src/rde/rded/rde_main.cc @@ -28,6 +28,7 @@ #include #include #include +#include "osaf/consensus/service.h" #include "base/daemon.h" #include "base/logtrace.h" #include "base/osaf_poll.h" @@ -37,6 +38,7 @@ #include #include "rde/rded/rde_cb.h" #include "rde/rded/role.h" +#include "base/conf.h" #define RDA_MAX_CLIENTS 32 @@ -92,10 +94,6 @@ static void handle_mbx_event() { TRACE_ENTER(); msg = reinterpret_cast(ncs_ipc_non_blk_recv(_cb->mbx)); - TRACE("Received %s from node 0x%x with state %s. My state is %s", -rde_msg_name[msg->type], msg->fr_node_id, -Role::to_string(msg->info.peer_info.ha_role), -Role::to_string(role->role())); switch (msg->type) { case RDE_MSG_PEER_INFO_REQ: @@ -118,6 +116,32 @@ static void handle_mbx_event() { case RDE_MSG_PEER_DOWN: LOG_NO("Peer down on node 0x%x", msg->fr_node_id); break; + case RDE_MSG_NEW_ACTIVE_CALLBACK: + { +const std::string my_node = base::Conf::NodeName(); + +// get current active controller +Consensus consensus_service; +std::string active_controller = consensus_service.CurrentActive(); + +LOG_NO("New active controller notification from consensus service"); + +if (role->role() == PCS_RDA_ACTIVE) { + if (my_node.compare(active_controller) != 0) { +// we are meant to be active, but consensus service doesn't think so +LOG_ER("Role does not match consensus service. New controller: %s", + active_controller.c_str()); +if (consensus_service.IsRemoteFencingEnabled() == false ) { + LOG_ER("Probable split brain. Rebooting this node"); + opensaf_reboot(0, nullptr, "Split-brain detected by consensus service"); +} + } + + // register for callback + consensus_service.MonitorActive(Role::MonitorCallback, rde_cb->mbx); +} + } + break; default: LOG_ER("%s: discarding unknown message type %u", __FUNCTION__, msg->type); break; @@ -205,11 +229,12 @@ int main(int argc, char *argv[]) { NCS_SEL_OBJ mbx_sel_obj; RDE_RDA_CB *rde_rda_cb = _cb->rde_rda_cb; int term_fd; - opensaf_reboot_prepare(); daemonize(argc, argv); + base::Conf::InitNodeName(); + if (initialize_rde() != NCSCC_RC_SUCCESS) goto init_failed; mbx_sel_obj = ncs_ipc_get_sel_obj(_cb->mbx); diff --git a/src/rde/rded/role.cc b/src/rde/rded/role.cc index f7511f0d8..cccbb299f 100644 --- a/src/rde/rded/role.cc +++ b/src/rde/rded/role.cc @@ -27,7 +27,9 @@ #include "base/process.h" #include "base/time.h" #include "base/ncs_main_papi.h" +#include "base/ncssysf_def.h" #include "rde/rded/rde_cb.h" +#include "osaf/consensus/service.h" const char* const Role::role_names_[] = {"Undefined", "ACTIVE","STANDBY", "QUIESCED", "QUIESCING", "Invalid"}; @@ -42,6 +44,23 @@ const char* Role::to_string(PCS_RDA_ROLE role) { : role_names_[0]; } +void Role::MonitorCallback(const std::string& new_value, SYSF_MBX mbx) +{ + TRACE_ENTER(); + +
[devel] [PATCH 0/5] Review Request for Add support for split brain prevention V2 [#64]
Summary: Add support for split brain prevention V2 [#64] Review request for Ticket(s): 64 Peer Reviewer(s): Anders, Hans Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE *** Affected branch(es): develop Development branch: ticket-64 Base revision: e1e0d2c0dc45a5ca7789f19d58dde0a41ed19354 Personal repository: git://git.code.sf.net/u/userid-2226215/review Impacted area Impact y/n Docsy Build systemy RPM/packaging n Configuration files n Startup scripts n SAF servicesy OpenSAF servicesy Core libraries n Samples n Tests n Other n Comments (indicate scope for each "y" above): - Changes from V1: * fixed most cppcheck/cpplint errors in osaf/consensus * disable self-fencing if remote-fencing is enabled * reboot active controller if it loses quorum (write access) * better error handling revision 7ab1280243311058a6848c4da2b9738ab73dc861 Author: Gary LeeDate: Fri, 19 Jan 2018 22:29:42 +1100 doc: update README and makefiles [#64] revision 625928304450399548c353473dea631a44aeecbe Author: Gary Lee Date: Fri, 19 Jan 2018 22:28:59 +1100 fmd: update consensus service during controller failover [#64] revision 42539da74893d5ce246242cf8d33c7875ea50fe8 Author: Gary Lee Date: Fri, 19 Jan 2018 22:26:19 +1100 amfd: update consensus service when performing SI swap [#64] When a node goes down and split-brain prevention is enabled, check that we still have write access to the consensus service. If not and fencing is disabled, reboot the node to prevent split brain. revision b6fcd4bede291ba5996b838c3fb784842648581e Author: Gary Lee Date: Fri, 19 Jan 2018 22:23:14 +1100 rded: add split brain prevention support [#64] * consult with consensus service before promoting node to active * add watch thread and self-fence if it detects active controller has been changed (if remote fencing is disabled) revision 656b670a91a10e385604c98366239a28cde925f7 Author: Gary Lee Date: Fri, 19 Jan 2018 22:22:53 +1100 osaf: add consensus API [#64] Added Files: src/osaf/consensus/Makefile src/osaf/consensus/keyvalue.cc src/osaf/consensus/keyvalue.h src/osaf/consensus/plugins/etcd.plugin src/osaf/consensus/plugins/sample.plugin src/osaf/consensus/service.cc src/osaf/consensus/service.h Complete diffstat: -- 00-README.conf | 56 +++ Makefile.am | 4 +- src/amf/amfd/ndproc.cc | 12 +- src/amf/amfd/osaf-amfd.in| 4 + src/amf/amfd/role.cc | 35 +++- src/fm/Makefile.am | 1 + src/fm/fmd/fm_main.cc| 37 - src/fm/fmd/fm_rda.cc | 28 src/fm/fmd/fmd.conf | 8 + src/osaf/Makefile.am | 8 +- src/osaf/consensus/Makefile | 18 ++ src/osaf/consensus/keyvalue.cc | 174 +++ src/osaf/consensus/keyvalue.h| 57 +++ src/osaf/consensus/plugins/etcd.plugin | 220 src/osaf/consensus/plugins/sample.plugin | 163 ++ src/osaf/consensus/service.cc| 277 +++ src/osaf/consensus/service.h | 75 + src/rde/Makefile.am | 3 +- src/rde/rded/osaf-rded.in| 4 + src/rde/rded/rde_cb.h| 3 +- src/rde/rded/rde_main.cc | 35 +++- src/rde/rded/role.cc | 47 +- src/rde/rded/role.h | 2 + 23 files changed, 1248 insertions(+), 23 deletions(-) Testing Commands: - *** LIST THE COMMAND LINE TOOLS/STEPS TO TEST YOUR CHANGES *** Testing, Expected Results: -- *** PASTE COMMAND OUTPUTS / TEST RESULTS *** Conditions of Submission: - *** HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC *** Arch Built StartedLinux distro --- mipsn n mips64 n n x86 n n x86_64 y y powerpc n n powerpc64 n n Reviewer Checklist: --- [Submitters: make sure that your review doesn't trigger any checkmarks!] Your checkin has not passed review because (see checked entries): ___ Your RR template is generally incomplete; it has too many blank entries that need proper data filled in. ___ You have failed to nominate the proper persons for review and push. ___ Your patches do not have proper short+long header ___
[devel] [PATCH 1/5] osaf: add consensus API [#64]
--- src/osaf/consensus/Makefile | 18 ++ src/osaf/consensus/keyvalue.cc | 174 +++ src/osaf/consensus/keyvalue.h| 57 +++ src/osaf/consensus/plugins/etcd.plugin | 220 src/osaf/consensus/plugins/sample.plugin | 163 ++ src/osaf/consensus/service.cc| 277 +++ src/osaf/consensus/service.h | 75 + 7 files changed, 984 insertions(+) create mode 100644 src/osaf/consensus/Makefile create mode 100644 src/osaf/consensus/keyvalue.cc create mode 100644 src/osaf/consensus/keyvalue.h create mode 100644 src/osaf/consensus/plugins/etcd.plugin create mode 100644 src/osaf/consensus/plugins/sample.plugin create mode 100644 src/osaf/consensus/service.cc create mode 100644 src/osaf/consensus/service.h diff --git a/src/osaf/consensus/Makefile b/src/osaf/consensus/Makefile new file mode 100644 index 0..a2c8bc9dd --- /dev/null +++ b/src/osaf/consensus/Makefile @@ -0,0 +1,18 @@ +# -*- OpenSAF -*- +# +# (C) Copyright 2018 The OpenSAF Foundation +# +# This program is distributed in the hope that it will be useful, but +# WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY +# or FITNESS FOR A PARTICULAR PURPOSE. This file and program are licensed +# under the GNU Lesser General Public License Version 2.1, February 1999. +# The complete license can be accessed from the following location: +# http://opensource.org/licenses/lgpl-license.php +# See the Copying file included with the OpenSAF distribution for full +# licensing terms. +# +# Author(s): Ericsson AB +# + +all: + $(MAKE) -C ../.. lib/libconsensus.la diff --git a/src/osaf/consensus/keyvalue.cc b/src/osaf/consensus/keyvalue.cc new file mode 100644 index 0..860a8657f --- /dev/null +++ b/src/osaf/consensus/keyvalue.cc @@ -0,0 +1,174 @@ +/* -*- OpenSAF -*- + * + * (C) Copyright 2018 The OpenSAF Foundation + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY + * or FITNESS FOR A PARTICULAR PURPOSE. This file and program are licensed + * under the GNU Lesser General Public License Version 2.1, February 1999. + * The complete license can be accessed from the following location: + * http://opensource.org/licenses/lgpl-license.php + * See the Copying file included with the OpenSAF distribution for full + * licensing terms. + * + * Author(s): Ericsson AB + * + */ +#include "osaf/consensus/keyvalue.h" +#include "base/logtrace.h" +#include "base/getenv.h" +#include "base/conf.h" + +int KeyValue::Execute(const std::string& command, std::string& output) { + TRACE_ENTER(); + constexpr size_t buf_size = 128; + std::arraybuffer; + FILE* pipe = popen(command.c_str(), "r"); + if (pipe == nullptr) { +return 1; + } + output = ""; + while (feof(pipe) == 0) { +if (fgets(buffer.data(), buf_size, pipe) != nullptr) { + output += buffer.data(); +} + } + const int exit_code = pclose(pipe); + if (output.empty() == false && isspace(output.back()) != 0) { +// remove newline at end of output +output.pop_back(); + } + TRACE("Executed '%s', returning %d", command.c_str(), exit_code); + return exit_code; +} + +SaAisErrorT KeyValue::Get(const std::string& key, std::string& value) { + TRACE_ENTER(); + + const std::string kv_store_cmd = base::GetEnv( +"FMS_KEYVALUE_STORE_PLUGIN_CMD", ""); + const std::string command(kv_store_cmd + " get " + key); + int rc = KeyValue::Execute(command, value); + TRACE("Read '%s'", value.c_str()); + + if (rc == 0) { +return SA_AIS_OK; + } else { +return SA_AIS_ERR_FAILED_OPERATION; + } +} + +SaAisErrorT KeyValue::Set(const std::string& key, const std::string& value) { + TRACE_ENTER(); + + const std::string kv_store_cmd = base::GetEnv( +"FMS_KEYVALUE_STORE_PLUGIN_CMD", ""); + const std::string command(kv_store_cmd + " set " + key + " " + value); + std::string output; + int rc = KeyValue::Execute(command, output); + + if (rc == 0) { +return SA_AIS_OK; + } else { +return SA_AIS_ERR_FAILED_OPERATION; + } +} + +SaAisErrorT KeyValue::Erase(const std::string& key) { + TRACE_ENTER(); + + const std::string kv_store_cmd = base::GetEnv( +"FMS_KEYVALUE_STORE_PLUGIN_CMD", ""); + const std::string command(kv_store_cmd + " erase " + key); + std::string output; + int rc = KeyValue::Execute(command, output); + + if (rc == 0) { +return SA_AIS_OK; + } else { +return SA_AIS_ERR_FAILED_OPERATION; + } +} + +SaAisErrorT KeyValue::Lock(const std::string& owner, + const unsigned int timeout) { + TRACE_ENTER(); + + const std::string kv_store_cmd = base::GetEnv( +"FMS_KEYVALUE_STORE_PLUGIN_CMD", ""); + const std::string command(kv_store_cmd + " lock " + owner + " " + +std::to_string(timeout)); + std::string output; + int rc =
Re: [devel] [PATCH 0/6] Review Request for dtm: Derive Node ID from IPv4 address [#2758]
I will update the README file before pushing. Regarding CHASSIS_ID and SUBSLOT_ID: the values 2 and 15 are used as default values only, in the case when /etc/opensaf/chassis_id and /etc/opensaf/subslot_id are missing. If the files are present then the values are configured in the files are used. regards, Anders Widell On 01/19/2018 09:57 AM, Ravi Sekhar Reddy Konda wrote: Hi Anders, Ack, reviewed & tested (combinations also) Update the README(if you are pushing #2759 immediately, you can do after #2759) one minor comment, in generate_nodeid you hardcoded chassis_id & subslot_id + CHASSIS_ID=2 + SUBSLOT_ID=15 I understand we are not taking them as config values, but better to define them in #defines Thanks, Ravi -Original Message- From: Anders Widell [mailto:anders.wid...@ericsson.com] Sent: Friday, January 12, 2018 5:54 PM To: Ravi Sekhar Reddy KondaCc: opensaf-devel@lists.sourceforge.net; Anders Widell Subject: [PATCH 0/6] Review Request for dtm: Derive Node ID from IPv4 address [#2758] Summary: dtm: Derive Node ID from IPv4 address [#2758] Review request for Ticket(s): 2758 Peer Reviewer(s): Ravi Pull request to: Affected branch(es): develop Development branch: ticket-2758 Base revision: 34a070372ff7cfe3caae7ec4e11a6681e19cdf31 Personal repository: git://git.code.sf.net/u/anders-w/review Impacted area Impact y/n Docsn Build systemn RPM/packaging n Configuration files n Startup scripts y SAF servicesy OpenSAF servicesy Core libraries y Samples n Tests y Other n Comments (indicate scope for each "y" above): - revision c9161898793a57ce1db18be5343b9ea2a372271e Author: Anders Widell Date: Fri, 12 Jan 2018 13:08:48 +0100 nid: Make chassis_id, slot_id and subslot_id in /etc/opensaf optional [#2758] The files chassis_id, slot_id and subslot_id in /etc/opensaf no longer have to be present. When they are missing, OpenSAF will derive the Node ID from the TIPC address or the IPv4 address of the node. revision 2500da3505f42aec23cc51eb0ec58091dadb14ea Author: Anders Widell Date: Fri, 12 Jan 2018 13:08:48 +0100 msg: Allow any unsigned 32-bit value to be used as Node ID [#2758] revision bc8ef895e4eb480416ab841d8c4ce1bf152fad82 Author: Anders Widell Date: Fri, 12 Jan 2018 13:08:48 +0100 lck: Allow any unsigned 32-bit value to be used as Node ID [#2758] revision 629672485cba6356c73ca31d3eea5d475e1b4cf8 Author: Anders Widell Date: Fri, 12 Jan 2018 13:08:48 +0100 clm: Allow any unsigned 32-bit value to be used as Node ID [#2758] Also fix the CLM API tests so that they longer assume that there is a node with node ID 0x2010f in the cluster. revision c10a688016d950ab16d09e08c63519de1c56e449 Author: Anders Widell Date: Fri, 12 Jan 2018 13:08:48 +0100 ckpt: Allow any unsigned 32-bit value to be used as Node ID [#2758] revision 8b56c8eec0da20b95c46772b682b23413e666dfd Author: Anders Widell Date: Fri, 12 Jan 2018 13:08:48 +0100 dtm: Derive Node ID from IPv4 address [#2758] If the /var/lib/opensaf/node_id file doesn't exist when DTM starts, DTM will create the file and use the IPv4 address as node ID. When using IPv6, the file must still be configured manually. IPv6 support may be added in a future ticket. Complete diffstat: -- src/ckpt/apitest/test_cpsv.h| 2 +- src/ckpt/ckptnd/cpnd_res.c | 18 ++--- src/ckpt/common/cpsv_evt.c | 2 +- src/clm/apitest/clmtest.cc | 6 ++ src/clm/apitest/tet_saClmClusterNodeGet.cc | 23 +++--- src/clm/apitest/tet_saClmClusterNodeGetAsync.cc | 15 ++-- src/clm/apitest/tet_saClmClusterTrack.cc| 2 +- src/clm/apitest/tet_saClmClusterTrackStop.cc| 4 +- src/clm/apitest/tet_saClmDispatch.cc| 6 +- src/clm/apitest/tet_saClmResponse.cc| 2 +- src/dtm/dtmnd/dtm.h | 1 + src/dtm/dtmnd/dtm_main.cc | 98 - src/dtm/dtmnd/dtm_node_db.cc| 7 ++ src/dtm/dtmnd/dtm_read_config.cc| 6 -- src/lck/apitest/tet_gla.c | 4 +- src/lck/apitest/tet_gld.c | 4 +- src/lck/apitest/tet_glnd.c | 2 +- src/lck/apitest/tet_glsv.h | 2 +- src/msg/apitest/tet_mqa.c | 4 +- src/msg/apitest/tet_mqsv.h | 2 +- src/nid/opensafd.in
Re: [devel] [PATCH 0/1] Review Request for removed extra OpenSafSmfConfig in xml [#2762]
ACK from my Side. Thanks, Syam. -Original Message- From: Srinivas [mailto:srinivas.mangip...@oracle.com] Sent: Friday, January 19, 2018 2:18 PM To: lennart.l...@ericsson.com; rafael.odza...@ericsson.com; ravisekhar.ko...@oracle.com; syam.tall...@oracle.com Cc: opensaf-devel@lists.sourceforge.net; SrinivasSubject: [PATCH 0/1] Review Request for removed extra OpenSafSmfConfig in xml [#2762] Summary: imm: removed extra OpenSafSmfConfig in xml [#2762] Review request for Ticket(s): 2762 Peer Reviewer(s): lennart.l...@ericsson.com , Ravi, Shyam, rafael.odza...@ericsson.com Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE *** Affected branch(es): develop Development branch: ticket-2762 Base revision: e1e0d2c0dc45a5ca7789f19d58dde0a41ed19354 Personal repository: git://git.code.sf.net/u/sri-mangipudy/review Impacted area Impact y/n Docsn Build systemn RPM/packaging n Configuration files y Startup scripts n SAF servicesn OpenSAF servicesn Core libraries n Samples n Tests n Other n Comments (indicate scope for each "y" above): - *** EXPLAIN/COMMENT THE PATCH SERIES HERE *** revision 9ed9e7a38e39d41d3698d5de9dbac6f07417693f Author: Srinivas Date: Fri, 19 Jan 2018 12:51:46 +0530 imm: removed extra OpenSafSmfConfig in xml [#2762] Complete diffstat: -- src/smf/config/smfsv_classes.xml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) Testing Commands: - run immxml-configure, extra OpenSafSmfConfig should not be present in the resultant xml. Testing, Expected Results: -- *** PASTE COMMAND OUTPUTS / TEST RESULTS *** Conditions of Submission: - *** HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC *** Arch Built StartedLinux distro --- mipsn n mips64 n n x86 n n x86_64 n n powerpc n n powerpc64 n n Reviewer Checklist: --- [Submitters: make sure that your review doesn't trigger any checkmarks!] Your checkin has not passed review because (see checked entries): ___ Your RR template is generally incomplete; it has too many blank entries that need proper data filled in. ___ You have failed to nominate the proper persons for review and push. ___ Your patches do not have proper short+long header ___ You have grammar/spelling in your header that is unacceptable. ___ You have exceeded a sensible line length in your headers/comments/text. ___ You have failed to put in a proper Trac Ticket # into your commits. ___ You have incorrectly put/left internal data in your comments/files (i.e. internal bug tracking tool IDs, product names etc) ___ You have not given any evidence of testing beyond basic build tests. Demonstrate some level of runtime or other sanity testing. ___ You have ^M present in some of your files. These have to be removed. ___ You have needlessly changed whitespace or added whitespace crimes like trailing spaces, or spaces before tabs. ___ You have mixed real technical changes with whitespace and other cosmetic code cleanup changes. These have to be separate commits. ___ You need to refactor your submission into logical chunks; there is too much content into a single commit. ___ You have extraneous garbage in your review (merge commits etc) ___ You have giant attachments which should never have been sent; Instead you should place your content in a public tree to be pulled. ___ You have too many commits attached to an e-mail; resend as threaded commits, or place in a public tree for a pull. ___ You have resent this content multiple times without a clear indication of what has changed between each re-send. ___ You have failed to adequately and individually address all of the comments and change requests that were proposed in the initial review. ___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc) ___ Your computer have a badly configured date and time; confusing the the threaded patch review. ___ Your changes affect IPC mechanism, and you don't present any results for in-service upgradability test. ___ Your changes affect user manual and documentation, your patch series do not contain the patch that updates the Doxygen manual. -- Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot ___ Opensaf-devel mailing list
Re: [devel] [PATCH 0/5] Review Request for smf: Add capability to redo CCBs that fail [#1398]
Hi Thanks for doing an early review. I have started to create a test application for this IMM object handler. It will be located in the same directory as the demo applications and being built the same way. The test application will make it possible to test this new handler independently of SMF. I have found some problems that I am fixing. The most seriously is that in the lower layer API (imm_om_api) handles are given when the objects are created and cannot be restored without creating a new object. This means that recovery if for example BAD_HANDLE does not work. It is also means that a new ModelModification object has to be created for each CCB. Also, see below for some comments/answers regarding the points below. Thanks Lennart > -Original Message- > From: Syam Prasad Talluri [mailto:syam.tall...@oracle.com] > Sent: den 18 januari 2018 09:30 > To: Lennart Lund; Vijay Roy > > Cc: opensaf-devel@lists.sourceforge.net > Subject: RE: [devel] [PATCH 0/5] Review Request for smf: Add capability to > redo CCBs that fail [#1398] > > Hi Lennart, > > I understand you are planning to send some more patches. Here are my > initial observations > > 1. In om_handle.cc we are reading the env variable > IMMOMCPP_TRACE_PATHNAME, we need to add about this in the ReadMe > file or remove this env [Lennart] Good finding, this has to be removed > 2. Like immccb operations do we need to implement wrappers for > saImmOmAccessorGet_2, saImmOmAdminOperationInvoke_2 which are > used by SMF. [Lennart] I agree, since we have the same problem with recovery handling etc. also here. However even if handlers for these OM APIs should be easier to implement and some of the code created for CCB handling is common it is projects that probably has to be handled one by one. > 3. Wrappers are added for IMM OM, do we need to have same kind of > wrapper for IMM OI implementation as well ? [Lennart] Yes, that would be really good to do but it will be more complicated and a bigger "project" than OM handling. In the log service there is a handler for the configuration object that is based on some ideas that can be used for an OI handler > 4. In om_handle.h is there any special reason to keep the below functions > void SetDummy(int i) { dummy_ = 1; } > int GetDummy(void) { return dummy_; } [Lennart] No, I have removed them > 5. There are TODO's in some files, I hope you will address them before final > commit. [Lennart] Yes. There is also a lot of LLDTEST tagged logs, traces and even code that also will be handled > 6. Looks like Wrappers implemented are generic Hence we can use these in > other services also and these can be moved to a common place going > forward. [Lennart] Yes, it is generic but let's use it with SMF first. If it works well it can be moved to a generic place and become library code. This also requires that the lower layer API wrappers are made generic library code. > > We will try to integrate the existing SMF OM Interface with these Wrappers > . > > > Thanks, > Syam. > -Original Message- > From: Lennart Lund [mailto:lennart.l...@ericsson.com] > Sent: Tuesday, January 16, 2018 8:11 PM > To: Vijay Roy > Cc: opensaf-devel@lists.sourceforge.net > Subject: [devel] [PATCH 0/5] Review Request for smf: Add capability to redo > CCBs that fail [#1398] > > Summary: smf: Add capability to redo CCBs that fail [#1398] Review request > for Ticket(s): 1398 Peer Reviewer(s): *** LIST THE TECH REVIEWER(S) / > MAINTAINER(S) HERE *** Pull request to: *** LIST THE PERSON WITH PUSH > ACCESS HERE *** Affected branch(es): develop Development branch: ticket- > 1398 Base revision: d082d2fb2b26437bbe6860e8efff8479748378c2 > Personal repository: git://git.code.sf.net/u/elunlen/review > > > Impacted area Impact y/n > > Docsn > Build systemn > RPM/packaging n > Configuration files n > Startup scripts n > SAF servicesy > OpenSAF servicesn > Core libraries n > Samples n > Tests n > Other n > > NOTE: Patch(es) contain lines longer than 80 characers > > Comments (indicate scope for each "y" above): > - > Note: > Logging and tracing is still tagged LLDTEST. This will be removed For testing > use the demo programs as before > > revision 011fc764956ff8892f577b44a7174384c85cc878 > Author: Lennart Lund > Date: Tue, 16 Jan 2018 15:28:10 +0100 > > smf: Add capability to redo CCBs that fail [#1398] > > Object modify is added > ccbdemo_modify updated with demo code for object modify > > Also some cleanup and bug fixes > > > > revision 5c2931adb8b8f5d030b869a48e0fcb91ceeda3ac > Author: Lennart Lund
Re: [devel] [PATCH 0/6] Review Request for dtm: Derive Node ID from IPv4 address [#2758]
Hi Anders, Ack, reviewed & tested (combinations also) Update the README(if you are pushing #2759 immediately, you can do after #2759) one minor comment, in generate_nodeid you hardcoded chassis_id & subslot_id + CHASSIS_ID=2 + SUBSLOT_ID=15 I understand we are not taking them as config values, but better to define them in #defines Thanks, Ravi -Original Message- From: Anders Widell [mailto:anders.wid...@ericsson.com] Sent: Friday, January 12, 2018 5:54 PM To: Ravi Sekhar Reddy KondaCc: opensaf-devel@lists.sourceforge.net; Anders Widell Subject: [PATCH 0/6] Review Request for dtm: Derive Node ID from IPv4 address [#2758] Summary: dtm: Derive Node ID from IPv4 address [#2758] Review request for Ticket(s): 2758 Peer Reviewer(s): Ravi Pull request to: Affected branch(es): develop Development branch: ticket-2758 Base revision: 34a070372ff7cfe3caae7ec4e11a6681e19cdf31 Personal repository: git://git.code.sf.net/u/anders-w/review Impacted area Impact y/n Docsn Build systemn RPM/packaging n Configuration files n Startup scripts y SAF servicesy OpenSAF servicesy Core libraries y Samples n Tests y Other n Comments (indicate scope for each "y" above): - revision c9161898793a57ce1db18be5343b9ea2a372271e Author: Anders Widell Date: Fri, 12 Jan 2018 13:08:48 +0100 nid: Make chassis_id, slot_id and subslot_id in /etc/opensaf optional [#2758] The files chassis_id, slot_id and subslot_id in /etc/opensaf no longer have to be present. When they are missing, OpenSAF will derive the Node ID from the TIPC address or the IPv4 address of the node. revision 2500da3505f42aec23cc51eb0ec58091dadb14ea Author: Anders Widell Date: Fri, 12 Jan 2018 13:08:48 +0100 msg: Allow any unsigned 32-bit value to be used as Node ID [#2758] revision bc8ef895e4eb480416ab841d8c4ce1bf152fad82 Author: Anders Widell Date: Fri, 12 Jan 2018 13:08:48 +0100 lck: Allow any unsigned 32-bit value to be used as Node ID [#2758] revision 629672485cba6356c73ca31d3eea5d475e1b4cf8 Author: Anders Widell Date: Fri, 12 Jan 2018 13:08:48 +0100 clm: Allow any unsigned 32-bit value to be used as Node ID [#2758] Also fix the CLM API tests so that they longer assume that there is a node with node ID 0x2010f in the cluster. revision c10a688016d950ab16d09e08c63519de1c56e449 Author: Anders Widell Date: Fri, 12 Jan 2018 13:08:48 +0100 ckpt: Allow any unsigned 32-bit value to be used as Node ID [#2758] revision 8b56c8eec0da20b95c46772b682b23413e666dfd Author: Anders Widell Date: Fri, 12 Jan 2018 13:08:48 +0100 dtm: Derive Node ID from IPv4 address [#2758] If the /var/lib/opensaf/node_id file doesn't exist when DTM starts, DTM will create the file and use the IPv4 address as node ID. When using IPv6, the file must still be configured manually. IPv6 support may be added in a future ticket. Complete diffstat: -- src/ckpt/apitest/test_cpsv.h| 2 +- src/ckpt/ckptnd/cpnd_res.c | 18 ++--- src/ckpt/common/cpsv_evt.c | 2 +- src/clm/apitest/clmtest.cc | 6 ++ src/clm/apitest/tet_saClmClusterNodeGet.cc | 23 +++--- src/clm/apitest/tet_saClmClusterNodeGetAsync.cc | 15 ++-- src/clm/apitest/tet_saClmClusterTrack.cc| 2 +- src/clm/apitest/tet_saClmClusterTrackStop.cc| 4 +- src/clm/apitest/tet_saClmDispatch.cc| 6 +- src/clm/apitest/tet_saClmResponse.cc| 2 +- src/dtm/dtmnd/dtm.h | 1 + src/dtm/dtmnd/dtm_main.cc | 98 - src/dtm/dtmnd/dtm_node_db.cc| 7 ++ src/dtm/dtmnd/dtm_read_config.cc| 6 -- src/lck/apitest/tet_gla.c | 4 +- src/lck/apitest/tet_gld.c | 4 +- src/lck/apitest/tet_glnd.c | 2 +- src/lck/apitest/tet_glsv.h | 2 +- src/msg/apitest/tet_mqa.c | 4 +- src/msg/apitest/tet_mqsv.h | 2 +- src/nid/opensafd.in | 49 ++--- 21 files changed, 149 insertions(+), 110 deletions(-) Testing Commands: - Start OpenSAF with IPv4 TCP transport, without configuring /etc/opensaf/slot_id or /var/lib/opensaf/node_id Testing, Expected Results: -- OpenSAF shall use the IPv4 address as Node ID. Conditions of Submission: - Ack from
[devel] [PATCH 0/1] Review Request for removed extra OpenSafSmfConfig in xml [#2762]
Summary: imm: removed extra OpenSafSmfConfig in xml [#2762] Review request for Ticket(s): 2762 Peer Reviewer(s): lennart.l...@ericsson.com , Ravi, Shyam, rafael.odza...@ericsson.com Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE *** Affected branch(es): develop Development branch: ticket-2762 Base revision: e1e0d2c0dc45a5ca7789f19d58dde0a41ed19354 Personal repository: git://git.code.sf.net/u/sri-mangipudy/review Impacted area Impact y/n Docsn Build systemn RPM/packaging n Configuration files y Startup scripts n SAF servicesn OpenSAF servicesn Core libraries n Samples n Tests n Other n Comments (indicate scope for each "y" above): - *** EXPLAIN/COMMENT THE PATCH SERIES HERE *** revision 9ed9e7a38e39d41d3698d5de9dbac6f07417693f Author: SrinivasDate: Fri, 19 Jan 2018 12:51:46 +0530 imm: removed extra OpenSafSmfConfig in xml [#2762] Complete diffstat: -- src/smf/config/smfsv_classes.xml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) Testing Commands: - run immxml-configure, extra OpenSafSmfConfig should not be present in the resultant xml. Testing, Expected Results: -- *** PASTE COMMAND OUTPUTS / TEST RESULTS *** Conditions of Submission: - *** HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC *** Arch Built StartedLinux distro --- mipsn n mips64 n n x86 n n x86_64 n n powerpc n n powerpc64 n n Reviewer Checklist: --- [Submitters: make sure that your review doesn't trigger any checkmarks!] Your checkin has not passed review because (see checked entries): ___ Your RR template is generally incomplete; it has too many blank entries that need proper data filled in. ___ You have failed to nominate the proper persons for review and push. ___ Your patches do not have proper short+long header ___ You have grammar/spelling in your header that is unacceptable. ___ You have exceeded a sensible line length in your headers/comments/text. ___ You have failed to put in a proper Trac Ticket # into your commits. ___ You have incorrectly put/left internal data in your comments/files (i.e. internal bug tracking tool IDs, product names etc) ___ You have not given any evidence of testing beyond basic build tests. Demonstrate some level of runtime or other sanity testing. ___ You have ^M present in some of your files. These have to be removed. ___ You have needlessly changed whitespace or added whitespace crimes like trailing spaces, or spaces before tabs. ___ You have mixed real technical changes with whitespace and other cosmetic code cleanup changes. These have to be separate commits. ___ You need to refactor your submission into logical chunks; there is too much content into a single commit. ___ You have extraneous garbage in your review (merge commits etc) ___ You have giant attachments which should never have been sent; Instead you should place your content in a public tree to be pulled. ___ You have too many commits attached to an e-mail; resend as threaded commits, or place in a public tree for a pull. ___ You have resent this content multiple times without a clear indication of what has changed between each re-send. ___ You have failed to adequately and individually address all of the comments and change requests that were proposed in the initial review. ___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc) ___ Your computer have a badly configured date and time; confusing the the threaded patch review. ___ Your changes affect IPC mechanism, and you don't present any results for in-service upgradability test. ___ Your changes affect user manual and documentation, your patch series do not contain the patch that updates the Doxygen manual. -- Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot ___ Opensaf-devel mailing list Opensaf-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-devel